
Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance


Abstract

We introduce a new measure of distance between languages based on word embedding, called word embedding language divergence (WELD). WELD is defined as divergence between unified similarity distribution of words between languages. Using such a measure, we perform language comparison for fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-aligned parallel corpora from bible translations for fifty languages spanning a variety of language families. Although we use parallel corpora, which guarantees having the same content in all languages, interestingly in many cases languages within the same family cluster together. In addition to natural languages, we perform language comparison for the coding regions in the genomes of 12 different organisms (4 plants, 6 animals, and two human subjects). Our result confirms a significant high-level difference in the genetic language model of humans/animals versus plants. The proposed method is a step toward defining a quantitative measure of similarity between languages, with applications in languages classification, genre identification, dialect identification, and evaluation of translations.
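
The abstract does not say how the similarity distribution is built or which divergence is used, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each language is given as an embedding matrix whose row i corresponds to the same aligned word in every language, that the "similarity distribution" is the normalized cosine similarity over all word pairs, and that the divergence is Jensen-Shannon. The function names (similarity_distribution, js_divergence, weld) are hypothetical.

```python
import numpy as np

def similarity_distribution(emb):
    """Map an (n_words, dim) embedding matrix to a probability
    distribution over all unordered word pairs (assumed construction)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T                      # pairwise cosine similarities
    upper = np.triu_indices(len(emb), k=1)   # each unordered pair once
    # Shift cosine from [-1, 1] to [0, 1] so every pair has non-negative mass.
    weights = (sim[upper] + 1.0) / 2.0
    return weights / weights.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in bits (assumed choice of divergence)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def weld(emb_a, emb_b):
    """Distance between two languages whose rows are aligned word by word."""
    return js_divergence(similarity_distribution(emb_a),
                         similarity_distribution(emb_b))
```

Because the comparison runs on pairwise similarities rather than raw vectors, this sketch does not require the two embedding spaces to share a coordinate system: rotating one language's embeddings leaves the distance unchanged. Computing weld for every pair of languages would give a distance matrix that could then be fed to a clustering method.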
